    Room geometry inference using sources and receivers on a uniform linear array

    State-of-the-art room geometry inference algorithms estimate the shape of a room by analyzing peaks in room impulse responses. These algorithms typically require the position of the source with respect to the receiver array; this position is often estimated with sound source localization, which is susceptible to high errors under common sampling frequencies. This paper proposes a new approach, namely using an array with a known geometry and consisting of both sources and receivers. When these transducers constitute a uniform linear array, new challenges and opportunities arise for performing room geometry inference. We propose solutions designed both to address these challenges and to leverage the opportunities for better results.

    Listening Tests with Individual versus Generic Head-Related Transfer Functions in Six-Degrees-of-Freedom Virtual Reality

    Individual head-related transfer functions (HRTFs) improve localization accuracy and externalization in binaural audio reproduction compared to generic HRTFs. Listening tests are often conducted using generic HRTFs due to the difficulty of obtaining individual HRTFs for all participants. This study explores the ramifications of the choice of HRTFs for critical listening in a six-degrees-of-freedom audio-visual virtual environment, when participants are presented with an overall audio quality evaluation task. The study consists of two sessions using either individual or generic HRTFs. A small effect between the sessions is observed in a condition where elevation cues are impaired. Other conditions are rated similarly between individual and generic HRTFs.

    Quality of experience in telemeetings and videoconferencing: a comprehensive survey

    Telemeetings such as audiovisual conferences or virtual meetings play an increasingly important role in our professional and private lives. For that reason, system developers and service providers will strive for an optimal experience for the user, while at the same time optimizing technical and financial resources. This leads to the discipline of Quality of Experience (QoE), an active field originating from the telecommunication and multimedia engineering domains, that strives for understanding, measuring, and designing the quality experience with multimedia technology. This paper provides the reader with an entry point to the large and still growing field of QoE of telemeetings, by taking a holistic perspective, considering both technical and non-technical aspects, and by focusing on current and near-future services. Addressing both researchers and practitioners, the paper first provides a comprehensive survey of factors and processes that contribute to the QoE of telemeetings, followed by an overview of relevant state-of-the-art methods for QoE assessment. To embed this knowledge into recent technology developments, the paper continues with an overview of current trends, focusing on the field of eXtended Reality (XR) applications for communication purposes. Given the complexity of telemeeting QoE and the current trends, new challenges for a QoE assessment of telemeetings are identified. To overcome these challenges, the paper presents a novel Profile Template for characterizing telemeetings from the holistic perspective endorsed in this paper

    Nonstationary noise PSD matrix estimation for multichannel blind speech extraction

    Noise power spectral density (PSD) matrix estimation is one of the most important components of a multichannel blind speech extraction framework, as it largely determines the amount of residual noise at the output of a spatial filter. Optimality of well-known spatial filters, such as the multichannel Wiener filter, is only ensured if the PSD matrices of the noise and the desired speech are accurately estimated. In practical situations, where the noise is nonstationary, temporal averaging over time frames where the desired signal is inactive does not provide sufficiently fast tracking of the noise PSD matrix, resulting in high residual noise at the spatial filter output. Therefore, approaches that estimate the PSD matrices using narrowband signal detection have been proposed. Following the well-known single- and multichannel minima-controlled recursive averaging (MCRA) approaches, in this paper, we focus on narrowband speech presence probability-based noise PSD matrix estimators, which are suitable for blind scenarios where the location and the propagation vector of the desired speech source are unknown. The main contributions of the paper are a maximum likelihood interpretation of the multichannel MCRA, and a coherent-to-diffuse ratio-based a priori speech absence probability (SAP) estimator. The a priori SAP is a key parameter that determines the accuracy of the noise PSD matrix estimates in nonstationary scenarios. In this paper, we confirm the importance of the a priori SAP and show that its control is crucial for source extraction in nonstationary environments.
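
    The recursive averaging idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common MCRA-style update in which the smoothing factor is pushed toward 1 as the speech presence probability grows, so the noise PSD matrix is frozen while speech is likely active. The function name, the parameter `alpha`, and the externally supplied probability are illustrative assumptions; the paper's own estimator, including its a priori SAP computation, is not reproduced here.

```python
import numpy as np

def update_noise_psd(noise_psd, frame, speech_presence_prob, alpha=0.9):
    """One recursive-averaging step for a single frequency bin.

    noise_psd            : (M, M) complex, previous noise PSD matrix estimate
    frame                : (M,) complex, current microphone STFT coefficients
    speech_presence_prob : float in [0, 1], assumed to be given by some detector
    alpha                : base smoothing factor (hypothetical default)
    """
    # Effective smoothing: equals alpha when speech is absent, 1 when present.
    alpha_eff = alpha + (1.0 - alpha) * speech_presence_prob
    # Instantaneous (rank-one) PSD matrix estimate y y^H.
    instantaneous = np.outer(frame, np.conj(frame))
    return alpha_eff * noise_psd + (1.0 - alpha_eff) * instantaneous
```

    With a probability of 1 the estimate is left unchanged, and with a probability of 0 the update reduces to plain exponential averaging of the observed frames.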

    An iterative least-squares design method for filters with constrained magnitude response in sound reproduction

    Filter coefficients determined according to a least-squares criterion are frequently used in applications related to sound zones and adjustable directivity. Without further constraints, the obtained filter coefficients can exhibit very large frequency-domain magnitudes whenever the underlying optimization problem is ill-conditioned. To avoid distortion in the loudspeakers during reproduction, the frequency responses of the reproduction filters can be limited in their magnitudes. However, solving the resulting optimization problem is computationally expensive, which constitutes a problem when large filter lengths or a large number of loudspeaker channels are considered. In this contribution, an efficient previously proposed algorithm is modified such that the filter’s magnitude response is constrained. The proposed algorithm uses an approximation in the discrete Fourier-transform domain to yield time-domain filter coefficients. The accuracy of the proposed algorithm is measured by comparing to a state-of-the-art approach for convex optimization considering a free-field scenario. Furthermore, the applicability of the results to real-world scenarios is investigated considering measured impulse responses

    Power-based signal-to-diffuse ratio estimation using noisy directional microphones

    The signal-to-diffuse ratio (SDR), which describes the power ratio between the direct and diffuse component of a sound field, is an important parameter in many applications. This paper proposes a power-based SDR estimator which considers the auto power spectral densities obtained by noisy directional microphones. Compared to recently proposed estimators that exploit the spatial coherence between two microphones, the power-based estimator is more robust at lower frequencies given that the microphone directivities are known with sufficiently high accuracy. The proposed estimator can incorporate more than two microphones and can therefore provide accurate SDR estimates independently of the direction-of-arrival of the direct sound. We further propose a method to determine the optimal microphone orientations for a given set of directional microphones. Simulations show the practical applicability.

    Practical considerations of time-varying feedback delay networks

    Feedback delay networks (FDNs) can be efficiently used to generate parametric artificial reverberation. Recently, the authors proposed a novel approach to time-varying FDNs by introducing a time-varying feedback matrix. The formulation of the time-varying feedback matrix was given in the complex eigenvalue domain, whereas this contribution specifies the requirements for real-valued time-domain processing. In addition, the computational costs of different time-varying feedback matrices, which depend on the matrix type and modulation function, are discussed. In a performance evaluation, the proposed orthogonal matrix modulation is compared to a direct interpolation of the matrix entries.

    Accurate reverberation time control in feedback delay networks

    The reverberation time is one of the most prominent acoustical qualities of a physical room. Therefore, it is crucial that artificial reverberation algorithms match a specified target reverberation time accurately. In feedback delay networks, a popular framework for modeling room acoustics, the reverberation time is determined by combining delay and attenuation filters such that the frequency-dependent attenuation response is proportional to the delay length, thereby complying with a global attenuation-per-second. However, only a few details are available on the attenuation filter design, as the approximation errors of the filter design are often regarded as negligible. In this work, we demonstrate that the error of the filter approximation propagates in a non-linear fashion to the resulting reverberation time, possibly causing large deviations from the specified target. For the special case of a proportional graphic equalizer, we propose a non-linear least squares solution and demonstrate the improved accuracy with a Monte Carlo simulation.
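
    The proportionality between attenuation and delay length described in this abstract can be sketched for the simplified broadband case, where each attenuation filter is a single gain. The sketch assumes the standard relation of 60 dB decay over the reverberation time; the function names are illustrative, and the frequency-dependent graphic equalizer design proposed in the paper is not covered.

```python
import math

def delay_line_gains(delays_samples, t60_seconds, sample_rate):
    """Broadband gain per delay line so that all lines share one dB-per-second decay."""
    # A 60 dB decay over t60 seconds gives the attenuation per sample in dB.
    db_per_sample = -60.0 / (t60_seconds * sample_rate)
    # Attenuation in dB is proportional to the delay length in samples.
    return [10.0 ** (db_per_sample * m / 20.0) for m in delays_samples]

def resulting_t60(gain, delay_samples, sample_rate):
    """Reverberation time implied by a (possibly inexact) gain on one delay line."""
    gain_db = 20.0 * math.log10(gain)
    return -60.0 * delay_samples / (gain_db * sample_rate)

if __name__ == "__main__":
    fs, t60, m = 48000, 2.0, 1000
    g = delay_line_gains([m], t60, fs)[0]
    # The same 0.1 dB gain error in either direction shifts the reverberation
    # time by different amounts, because T60 depends on 1 / gain_db.
    for err_db in (-0.1, 0.0, 0.1):
        print(err_db, resulting_t60(g * 10.0 ** (err_db / 20.0), m, fs))
```

    Because the reverberation time depends on the reciprocal of the gain in dB, equal gain errors of opposite sign produce unequal reverberation time errors, which is one way to see the non-linear error propagation the abstract refers to.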

    Cramér-Rao bound analysis of reverberation level estimators for dereverberation and noise reduction

    The reverberation power spectral density (PSD) is often required for dereverberation and noise reduction algorithms. In this work, we compare two maximum likelihood (ML) estimators of the reverberation PSD in a noisy environment. In the first estimator, the direct path is first blocked. Then, the ML criterion for estimating the reverberation PSD is stated according to the probability density function of the blocking matrix (BM) outputs. In the second estimator, the speech component is not blocked. Instead, the ML criterion for estimating the speech and reverberation PSD is stated according to the probability density function of the microphone signals. To compare the expected mean square error (MSE) between the two ML estimators of the reverberation PSD, the Cramér-Rao Bounds (CRBs) for the two ML estimators are derived. We show that the CRB for the joint reverberation and speech PSD estimator is lower than the CRB for estimating the reverberation PSD from the BM outputs. Experimental results show that the MSE of the two estimators indeed obeys the CRB curves. Experimental results for a multimicrophone dereverberation and noise reduction algorithm show the benefits of using the ML estimators in comparison with other baseline estimators.

    Multispeaker LCMV beamformer and postfilter for source separation and noise reduction

    The problem of source separation and noise reduction using multiple microphones is addressed. The minimum mean square error (MMSE) estimator for the multispeaker case is derived and a novel decomposition of this estimator is presented. The MMSE estimator is decomposed into two stages: first, a multispeaker linearly constrained minimum variance (LCMV) beamformer (BF); and second, a subsequent multispeaker Wiener postfilter. The first stage separates and enhances the signals of the individual speakers by utilizing the spatial characteristics of the speakers [as manifested by the respective acoustic transfer functions (ATFs)] and the noise power spectral density (PSD) matrix, while the second stage exploits the speakers' PSD matrix to reduce the residual noise at the output of the first stage. The output vector of the multispeaker LCMV BF is proven to be the sufficient statistic for estimating the marginal speech signals in both the classic sense and the Bayesian sense. The log spectral amplitude estimator for the multispeaker case is also derived given the multispeaker LCMV BF outputs. The performance evaluation was conducted using measured ATFs and directional noise with various signal-to-noise ratio levels. It is empirically verified that the multispeaker postfilters are beneficial in terms of signal-to-interference plus noise ratio improvement when compared with the single-speaker postfilter.